Chapter 2: More Image Manipulation¶
Implementing Color Pop Effects¶
The color pop effect, also known as selective color or color splash, is a photographic or graphic design technique where a specific part of an image is highlighted by retaining its original color while the rest of the image is converted to grayscale or desaturated. This creates a striking visual impact by drawing attention to the colored element against a black-and-white or muted background.
The color pop effect refers to making certain colors in an image stand out while desaturating the rest of the image. The following function apply_color_pop() pops the red color, so that only the colors of the buses are retained. Hue values $0\leq h \leq 10$ and $170 \leq h \leq 180$ (on a $0$–$180$ hue scale) are used to create the mask containing the red pixels. The next figure shows the output image obtained after applying the visual effect.
%matplotlib inline
import warnings
warnings.filterwarnings('ignore')
from PIL import Image, ImageFont, ImageDraw, ImageFilter, ImageOps
from PIL.ImageChops import add, subtract, multiply, difference, screen, soft_light, hard_light, overlay, lighter
import PIL.ImageStat as stat
from pillow_lut import load_cube_file
from skimage import color, exposure, img_as_float, data
from skimage.io import imread, imsave, imshow, show, imread_collection, imshow_collection
from skimage.transform import SimilarityTransform, PiecewiseAffineTransform, warp, swirl, rescale, resize, downscale_local_mean
from skimage.util import view_as_blocks, invert, random_noise, montage
from skimage.color import rgb2gray, gray2rgb
from skimage import measure
import scipy.ndimage as ndimage
from scipy.ndimage import affine_transform
import cv2
import wand
import blend_modes #, imgviz
import numpy as np
import matplotlib.image as mpimg
import matplotlib.pylab as plt
from time import time
import os
def apply_color_pop(im):
    gray = rgb2gray(im)
    gray = np.stack((gray,)*3, axis=-1) # replicate the grayscale values to 3 channels
    hsv = color.rgb2hsv(im)
    mask_cond = (hsv[...,0] <= 10/180) | (hsv[...,0] >= 170/180) # red hues
    gray[mask_cond] = im[mask_cond] # restore the original color where the mask is True
    return gray
# Load an image and apply the color pop effect
im = imread('images/bus.jpg')
im = im/im.max()
out_im = apply_color_pop(im)
plt.figure(figsize=(20,7))
plt.imshow(np.hstack((im, out_im))), plt.axis('off')
plt.show()
Correcting a Fish-eye distorted Image¶
Fisheye distortion, also known as barrel distortion, is a type of geometric distortion that occurs in images taken with fisheye lenses. Fisheye lenses are ultra-wide-angle lenses that capture a significantly wider field of view compared to regular lenses. This wide field of view is achieved by using a special optical design that allows light to enter the camera from a very wide angle.
The distortion characteristic of fisheye lenses is such that straight lines in the scene, especially those near the edges of the image, appear curved. In barrel distortion, straight lines that should be straight in the image are instead curved outward, creating a barrel-like or fishbowl effect. This distortion is more pronounced towards the edges of the image and becomes less noticeable towards the center.
There are different types of fisheye distortion, including equidistant, stereographic, and orthographic projections, each producing a different visual effect. While fisheye distortion can be undesirable in some cases, it is intentionally used in others to create artistic or unique perspectives in photography and videography.
In image processing, we aim to correct or minimize fisheye distortion. This process is known as fisheye correction or rectification, where algorithms are applied to correct the geometric distortions and restore the straightness of lines in the image. Fisheye correction is commonly used in applications such as panoramic photography, computer vision, and virtual reality to ensure accurate representation of the scene.
The next code snippet defines a function undistort(), which implements the fisheye-distortion correction process (reference: https://tannerhelland.com/2013/02/11/simple-algorithm-correcting-lens-distortion.html). The use of simple resampling blurs the output slightly.
The algorithm works just fine without the zoom parameter. The zoom parameter is added after some experimentation; specifically, it is useful in two ways:
- On images with only minor lens distortion, zooming out reduces stretching artifacts at the edges of the corrected image
- On images with severe distortion, such as true fish-eye photos, zooming-out retains more of the source material
def undistort(im, zoom=1):
    w, h = im.shape[:2]
    im_new = np.zeros_like(im)
    hw, hh = w / 2, h / 2 # image center
    strength = 5
    correction_radius = np.sqrt(w ** 2 + h ** 2) / strength
    for x in range(w):
        for y in range(h):
            new_x, new_y = x - hw, y - hh # coordinates relative to the center
            distance = np.sqrt(new_x**2 + new_y**2)
            r = distance / correction_radius
            theta = 1 if r == 0 else np.arctan(r) / r # distortion-correction factor
            source_x, source_y = hw + theta * new_x * zoom, hh + theta * new_y * zoom
            # set color of pixel (x, y) to color of source image pixel at (source_x, source_y)
            im_new[x, y] = im[round(source_x), round(source_y)]
    return im_new
distorted_image = imread('images/distorted.jpg')
undistorted_image = undistort(distorted_image)
# plot the images here
plt.figure(figsize=(12,5))
plt.imshow(np.hstack((distorted_image, undistorted_image))), plt.axis('off')
plt.show()
As can be seen from the above figure, the undistort() function removes the fisheye distortion from the input image to a great extent.
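Because undistort() visits every pixel in nested Python loops, it can be slow on large images. Below is a minimal numpy sketch of the same inverse mapping, vectorized over all pixels at once; the name undistort_vectorized() is ours, and it uses the same nearest-pixel sampling as the loop version:

```python
import numpy as np

def undistort_vectorized(im, zoom=1, strength=5):
    # same inverse mapping as undistort(), computed for all pixels at once
    w, h = im.shape[:2]
    hw, hh = w / 2, h / 2
    correction_radius = np.sqrt(w ** 2 + h ** 2) / strength
    x, y = np.meshgrid(np.arange(w), np.arange(h), indexing='ij')
    dx, dy = x - hw, y - hh                      # coordinates relative to the center
    r = np.sqrt(dx ** 2 + dy ** 2) / correction_radius
    theta = np.ones_like(r)
    nz = r > 0
    theta[nz] = np.arctan(r[nz]) / r[nz]         # distortion-correction factor
    src_x = np.clip(np.round(hw + theta * dx * zoom).astype(int), 0, w - 1)
    src_y = np.clip(np.round(hh + theta * dy * zoom).astype(int), 0, h - 1)
    return im[src_x, src_y]                      # nearest-pixel resampling
```

On a typical image this runs orders of magnitude faster than the per-pixel loop, and should produce the same output as the loop version.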
Image Manipulations with scikit-image¶
As done above using the library PIL, we can also use functions from the scikit-image library for image manipulation; some examples are shown below.
Inverse warping and Geometric transformation using the warp() function¶
scikit-image transform module's warp() function can be used for inverse warping for geometric
transformation of an image (discussed in a previous section), as demonstrated in the
following examples.
Applying translation on an image¶
For all possible pixels $(x,y)$ in the translated output image, the corresponding points $(u, v)$ in the input image can be found with the following equations:
$$ u = x - t_x $$
$$ v = y - t_y $$
where $t_x$ and $t_y$ denote the translations along the x and y axes, respectively (and $x(u,v) = x$, $y(u,v)=y$). The following code block shows how to use the warp() function to translate an image.
def translate(xy, t_x, t_y):
    xy[:, 0] -= t_y
    xy[:, 1] -= t_x
    return xy
im = imread('images/monalisa.jpg')
im_trans = warp(im, translate, map_args={'t_x':50, 't_y':-100}) # create a dictionary for translation parameters
Applying rotation to an image¶
If an image is rotated by an angle $\theta$ counterclockwise, then for all possible pixels $(x,y)$ in the rotated output image, the corresponding points $(u, v)$ in the input image can be found with the following equations:
$$ u = x\cos\theta - y\sin\theta $$
$$ v = x\sin\theta + y\cos\theta $$
The following code block shows how to use the warp() function to rotate an image.
def rotate(xy, theta):
    theta = np.pi/180*theta # degrees to radians
    xy[:, 0], xy[:, 1] = xy[:, 0]*np.cos(theta) - xy[:,1]*np.sin(theta), xy[:, 0]*np.sin(theta) + xy[:,1]*np.cos(theta)
    return xy
im_rot = warp(im, rotate, map_args={'theta':10}) # create a dictionary for rotation parameters
Applying an Affine transformation to an image¶
We can use the function SimilarityTransform() to compute the transformation matrix,
followed by warp() function, to carry out the transformation, as shown in the next code
block.
tform = SimilarityTransform(scale=0.9, rotation=np.pi/4, translation=(im.shape[0]/2, -100))
im_sim = warp(im, tform)
Applying the Swirl transform¶
This is a non-linear transform, defined by the scikit-image documentation as follows. Consider the coordinate $(x,y)$ in the output image. The reverse mapping for the swirl transformation first computes, relative to a center $(x_0,y_0)$, its polar coordinates $\rho, \theta$, and then transforms them according to the following:
$$ r = \ln(2)\,\mathtt{radius}/5 $$
$$ \phi = \mathtt{rotation} $$
$$ s = \mathtt{strength} $$
$$ \theta' = \phi + s\,e^{-\rho/r} + \theta $$
where strength is a parameter for the amount of swirl, radius indicates the swirl extent in
pixels, and rotation adds a rotation angle. The transformation of radius into $r$ is to ensure
that the transformation decays to $≈1/1000^{th}$ within the specified radius. Here we shall use the swirl() function from the skimage.transform module.
The next figure shows the output images obtained by applying the different transforms from the above code snippets to the input image.
im_swirled = swirl(im, rotation=0, strength=60, radius=300)
plt.figure(figsize=(9,4))
plt.subplot(141), plt.imshow(im_trans), plt.axis('off'), plt.title('translation', size=12)
plt.subplot(142), plt.imshow(im_rot), plt.axis('off'), plt.title('rotation', size=12)
plt.subplot(143), plt.imshow(im_sim), plt.axis('off'), plt.title('similarity transform', size=12)
plt.subplot(144), plt.imshow(im_swirled), plt.axis('off'), plt.title('swirl transform', size=12)
plt.suptitle('geometric transformation with warp()', size=13)
plt.tight_layout()
plt.show()
Applying Piecewise Affine Transformation¶
A piecewise affine transformation is a type of geometric transformation that divides an image into smaller regions and applies an affine transformation independently to each region. This allows for more flexible and localized transformations compared to a global affine transformation.
In scikit-image, you can use the PiecewiseAffineTransform class to implement piecewise affine transformations. Here's a step-by-step guide on how to do it:
rows, cols = im.shape[0], im.shape[1]
src_cols = np.linspace(0, cols, 20)
src_rows = np.linspace(0, rows, 10)
src_rows, src_cols = np.meshgrid(src_rows, src_cols)
src = np.dstack([src_cols.flat, src_rows.flat])[0]
# add sinusoidal oscillation to row coordinates
dst_rows = src[:, 1] - np.cos(np.linspace(0, 3 * np.pi, src.shape[0])) * 50
dst_cols = src[:, 0]
dst_rows *= 1.5
dst_rows -= 1.5 * 50
dst = np.vstack([dst_cols, dst_rows]).T
tform = PiecewiseAffineTransform()
tform.estimate(src, dst)
out_rows = im.shape[0] - 1.5 * 50
out_cols = cols
out = warp(im, tform, output_shape=(out_rows, out_cols))
plt.imshow(out), plt.axis('off')
plt.title('warped', size=15)
plt.show()
Adding Random Gaussian Noise to images¶
We can use the random_noise() function to add different types of noise to an image. The next code example shows how Gaussian noise with different variances can be added to an image, and the next figure shows the resulting output images. As can be seen, the larger the standard deviation of the Gaussian noise, the noisier the output image becomes.
im = img_as_float(imread("images/parrot.png"))
plt.figure(figsize=(12,10))
sigmas = [0.1, 0.25, 0.5, 1]
for i in range(4):
    noisy = random_noise(im, var=sigmas[i]**2)
    plt.subplot(2,2,i+1), plt.imshow(noisy), plt.axis('off')
    plt.title('Gaussian noise with sigma=' + str(sigmas[i]), size=20)
plt.tight_layout()
plt.show()
Creating a montage of several multi-channel images¶
The next code example shows how the function montage() can be used to create a montage
of noisy images obtained by using Gaussian noise with different variances to an image.
im = img_as_float(imread("images/parrot.png"))
plt.figure(figsize=(12,12))
sigmas = np.linspace(0, 1, 16)
noisy_images = np.zeros((16, im.shape[0], im.shape[1], im.shape[2])) # one slot per sigma
for i in range(len(sigmas)):
    noisy_images[i,:,:,:] = random_noise(im, var=sigmas[i]**2)
noisy_images_montage = montage(noisy_images, rescale_intensity=True, multichannel=True)
plt.imshow(noisy_images_montage), plt.axis('off')
plt.tight_layout()
plt.show()
Computing the Cumulative Distribution function of an image¶
We can compute the cumulative distribution function (cdf) for a given image with
the cumulative_distribution() function, as shown below. The next figure plots the cumulative distributions for each of the R,G,B channels, as an output of the next code snippet.
im = imread("images/parrot.png")
cdf_r = exposure.cumulative_distribution(im[:,:,0]) # cdf for the red channel
cdf_g = exposure.cumulative_distribution(im[:,:,1]) # cdf for the green channel
cdf_b = exposure.cumulative_distribution(im[:,:,2]) # cdf for the blue channel
plt.figure(figsize=(8,6))
plt.plot(cdf_r[1], cdf_r[0], 'r.-')
plt.plot(cdf_g[1], cdf_g[0], 'g.-')
plt.plot(cdf_b[1], cdf_b[0], 'b.-')
plt.xlabel('pixel value (x)', size=15)
plt.ylabel('cumulative probability P(X <= x)', size=15)
plt.show()
Resizing / rescaling an image¶
Resizing or rescaling an image is a common operation in image processing, which can be done easily using the resize() function from the transform module of the scikit-image library.
- The resize() function scales the image to the specified output shape. The function uses interpolation to estimate pixel values in the resized image; the default is bi-linear (spline order 1) interpolation, but you can change it using the order parameter if needed. For example, order=0 gives nearest-neighbor interpolation.
- The rescale operation (using the function rescale()) resizes an image by a given scaling-factor argument, which can either be a single floating-point value, or multiple values - one along each axis. resize() serves the same purpose, but instead of a scaling factor it accepts the desired output image shape.
- Note that when down-sampling an image, resize() and rescale() should perform Gaussian smoothing to avoid aliasing artifacts; this is enabled by setting the boolean input argument anti_aliasing to True.
- Aliasing is a phenomenon in signal / image processing where high-frequency components (details) of an image are incorrectly represented at a lower frequency, leading to distortions and visual artifacts. Spatial aliasing occurs in an image when the sampling rate (or resolution) is insufficient to accurately capture the high-frequency information in the signal, manifesting as jagged or moiré patterns, distortion, and other undesirable effects.
- To mitigate aliasing in image processing, anti-aliasing techniques are often employed, such as filtering and smoothing the image before downsampling, or techniques like super-sampling.
- The downscale_local_mean() function down-samples an n-dimensional image by integer factors, using the local mean over the elements of each block of the size given by the factors parameter.
- The next code snippet demonstrates the above functions with / without anti-aliasing. As expected, the anti-aliased version is a smoother image, with much-reduced aliasing artifacts.
image = imread('images/model.png')[...,:3]
image = image / image.max()
image_rescaled = rescale(image, 0.25, anti_aliasing=False, channel_axis=-1)
image_resized = resize(image, (image.shape[0] // 4, image.shape[1] // 4), anti_aliasing=True)
image_downscaled = downscale_local_mean(image, (4, 3, 1))
print(image.shape, image_rescaled.shape, image_resized.shape, image_downscaled.shape)
plt.figure(figsize=(10,5))
plt.gray()
plt.imshow(np.hstack((image_rescaled, image_resized, image_downscaled))), plt.axis('off')
plt.title('rescaled (aliasing), resized (no aliasing) and downscaled images', size=10)
plt.show()
(640, 361, 3) (160, 90, 3) (160, 90, 3) (160, 121, 3)
def plot_image(im, title='', fontsize=20):
    plt.imshow(im), plt.axis('off'), plt.title(title, size=fontsize)

def plot_images(images, titles, suptitle='', fontsize=20, \
                supfontsize=22, figsize=(15,7)):
    n = len(images)
    plt.figure(figsize=figsize)
    plt.gray()
    for i in range(n):
        plt.subplot(1,n,i+1), plot_image(images[i], titles[i], fontsize)
    plt.suptitle(suptitle, size=supfontsize)
    plt.tight_layout()
    plt.show()
from skimage.util import crop
im = imread('images/monalisa.jpg')
im_cropped = crop(im, ((250, 975), (280, 350), (0,0)), copy=False)
print(im.shape, im_cropped.shape)
# (1280, 846, 3) (55, 216, 3)
plot_images([im, im_cropped], ['original', 'cropped'], figsize=(6,5))
(1280, 846, 3) (55, 216, 3)
Inverting an image¶
We can simply use the invert() function to get the negative of an image, as shown in the code below. The next figure shows the output negative image generated by running the next lines of code.
im = im / im.max()
np.all(1 - im == invert(im))
Image.fromarray((255*(1-im)).astype(np.uint8))
import cv2
im = cv2.cvtColor(cv2.imread('images/parrot.png'), cv2.COLOR_BGR2RGB)
plt.figure(figsize=(15,7))
plt.imshow(np.hstack((im,
# Flip Image Vertically
cv2.flip(im, flipCode=0),
# Flip Image Horizontally
cv2.flip(im, flipCode=1),
# Flip Image Vertically and Horizontally
cv2.flip(im, flipCode=-1))))
plt.show()
Image Manipulations with PIL / Pillow¶
PIL provides us with many functions to manipulate an image, e.g., using a point transformation to change the pixel values, or to perform geometric transformations on an image. Let us first start by loading the parrot png image, as shown in the following code. The next few sections describe how to perform different types of image manipulations with PIL.
im = Image.open("images/parrot.png") # open the image, provide the correct path
print(im.width, im.height, im.mode, im.getbands(), im.format) # print image size, mode and format
453 340 RGB ('R', 'G', 'B') PNG
x = im.load()
x[0,0]
(81, 116, 94)
print(im.getdata()[0])
print(im.convert('L').getdata()[0])
print(im.convert('F').getdata()[0])
(81, 116, 94) 103 103.0270004272461
im_data = im.getdata()
print(type(im_data))
print(im_data.size)
print(im.convert('L').getdata().getextrema())
<class 'ImagingCore'> (225, 225) (1, 235)
Cropping an image¶
We can use the crop() function with the desired rectangle argument to extract the
corresponding area from the image, as shown in the following code.
im = Image.open("images/parrot.png")
im_c = im.crop((175,75,320,200)) # crop the rectangle given by (left, top, right, bottom) from the image
im_c.save("images/parrot_cropped.jpg")
im_c.show()
The next figure displays the image saved in jpg format using the above code.
im_c
Resizing an image¶
In order to increase or decrease the size of an image we can use the resize() function that internally up-samples or down-samples the image, respectively.
Resizing to a larger image¶
Let us start with a small image of size 150 x 100 and create a larger image from it. The next code snippet loads the small bricks image to start from.
im = Image.open("images/bricks_small.jpeg")
print(im.width, im.height)
# 150 100
im.show()
150 100
The output of the above code, the small bricks image, is shown below.

The next code snippet shows how the resize() function can be used to enlarge the above input image (by a factor of 5 along each dimension) to obtain an output image 25 times larger than the input image, using bi-linear interpolation (an up-sampling technique); the details of how this technique works are discussed later in the book.
im = Image.open("images/bricks_small.jpeg")
print(im.width, im.height)
# 150 100
im.resize((im.width*5, im.height*5), Image.BILINEAR) \
.save('images/bricks_large.jpeg')
im_large = Image.open("images/bricks_large.jpeg")
print(im_large.width, im_large.height)
# 750 500
im_concat = Image.new('RGB', (2*im_large.width, im_large.height))
im_concat.paste(im, (0,0))
im_concat.paste(im_large, (im_large.width,0))
im_concat #.show()
150 100 750 500
Resizing to a smaller image¶
Now let us do the reverse, i.e. start with a large image of the Victoria Memorial Hall (of size 720 x 540) and create a smaller-sized image. The next code snippet shows the large image to start from.
im = Image.open("images/victoria_memorial.png")
im.resize((im.width//5, im.height//5), Image.Resampling.LANCZOS).save('images/victoria_memorial_small.png')
im_small = Image.open("images/victoria_memorial_small.png")
im_concat = Image.new('RGB', (2*im.width, im.height))
im_concat.paste(im, (0,0))
im_concat.paste(im_small, (im.width,0))
im_concat #.show()
The above code shows how the function resize() can be used to shrink the image of the Victoria Memorial Hall (by a factor of 5 along each dimension) to obtain an output image 25 times smaller than the input image, using Lanczos resampling (a high-quality down-sampling technique); the details of how it works are discussed later in the book.
Pixelating an image¶
Pixelation refers to the visual effect of making an image appear blurry or unclear by replacing fine details with larger, square-shaped blocks or pixels. This technique is often used to obscure or hide sensitive information, such as faces or other identifying features, in images or videos. Pixelation is a form of image processing that reduces the level of detail in a specific region of an image.
The process of pixelation involves dividing the image into small, discrete square areas and replacing the pixel values within those areas with an average or representative color. This results in a blocky appearance that makes it difficult to discern the finer details of the original image. The extent of pixelation, or the size of the pixel blocks used, can be adjusted to achieve varying levels of obscuration.
Here's a basic overview of how pixelation is done:
- Determine the Region to Pixelate
- Define Pixel Block Size
- Replace Pixel Values
- Apply the Pixelation Effect
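The steps above can be sketched directly in numpy before we turn to the PIL-based implementation: split the image into fixed-size tiles and replace each tile with its mean color (the helper pixelate_blocks() and its block parameter are ours, for illustration only):

```python
import numpy as np

def pixelate_blocks(im, block=8):
    # replace each block x block tile by its mean color (illustrative sketch)
    out = im.astype(float)
    h, w = im.shape[:2]
    for i in range(0, h, block):
        for j in range(0, w, block):
            # average over the tile's spatial dimensions only
            out[i:i+block, j:j+block] = im[i:i+block, j:j+block].mean(axis=(0, 1))
    return out.astype(im.dtype)
```

Larger values of block obscure more detail, which is the knob referred to in the steps above.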
Let's first implement the function pixelate(); as shown in the next code snippet, it applies pixelation to the entire input image
- first by reducing the size of the image by the given input factor, using bi-linear interpolation, and
- then resizing the image back to its original size using nearest-neighbor interpolation.
The next code snippet shows how the pixelation effect is applied to the input parrot image, with different values of factor (e.g., 16 and 32).
def pixelate(im, factor):
    # resize down smoothly
    im_small = im.resize((im.width//factor, im.height//factor), resample=Image.BILINEAR)
    # scale back up to the original size using NEAREST
    return im_small.resize(im.size, Image.NEAREST)
im = Image.open("images/parrot.png")
plt.figure(figsize=(15,6))
plt.imshow(np.hstack([np.array(im)] + [np.array(pixelate(im, factor)) for factor in [16,32]])), plt.axis('off')
plt.title('Pixelating by downsampling to factor of 16 and 32', size=20)
plt.show()
Pixelating a region (face) of an image¶
Now, let's apply the pixelation to a part of the image, for example, the face region in the input image, where we want to hide the details. The pixelation effect can be applied to the region of interest (ROI) using numpy slicing, as shown in the next code snippet and the following figure.
im = Image.open("images/monalisa.jpg")
im_p = pixelate(im, 24)
x, y, h, w = 310, 240, 180, 170 #105, 105, 55, 45 # specify rect region
out = np.array(im)
out[y:y+h, x:x+w] = np.array(im_p)[y:y+h, x:x+w]
plt.imshow(np.hstack((np.array(im), out))), plt.axis('off')
plt.show()
Inverting, Solarizing and Posterizing an Image¶
We can use the point() function from PIL.Image to transform each pixel value with a single-argument function. We can use it to negate an image, as shown in the next code block. The pixel values are represented using 1-byte unsigned integers, which is why subtracting a pixel from the maximum possible value (255) is the exact point operation required to obtain the inverted image.
Inverting / Negating an image¶
Inverting or negating an image involves reversing the colors of the pixels, turning light areas dark and vice versa. The inversion is typically performed by subtracting the pixel values from the maximum intensity value (e.g., 255). To invert a color image, each color channel is inverted independently. Let's demonstrate how the inversion operation can be implemented with the Image.point() function in the next code snippet.
im = Image.open("images/monalisa.jpg")
im_t = im.point(lambda x: 255 - x)
im_t.show()
Solarizing / Posterizing an image¶
Solarize and posterize operations are specific types of manipulation that can be applied to an image to achieve different visual effects, as discussed below.
Solarize: with this technique the tones in the image are reversed, resulting in a partial or complete reversal of the usual dark and light areas. In other words, the darkest areas become light, and the lightest areas become dark. Solarizing an image can create a surreal and high-contrast look, often with a combination of positive and negative tones.
Posterize: this technique reduces the number of tones in an image to a small, fixed number (by reducing the number of bits to store). This reduction in tones results in distinct, flat areas of color or brightness in the image, giving it a stylized and simplified appearance. Posterizing an image can create a graphic, stylized effect with clearly defined areas of color or brightness. It's called posterize because the technique is often used in poster art.
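Before turning to the library functions, here is a minimal numpy sketch of the two point operations just described (the helpers solarize_np() and posterize_np() are our own; the threshold 128 matches PIL's default, and bits=2 matches the posterize example below):

```python
import numpy as np

def solarize_np(im, threshold=128):
    # invert only the pixels at or above the threshold
    return np.where(im >= threshold, 255 - im, im)

def posterize_np(im, bits=2):
    # keep only the `bits` most significant bits of each 8-bit value
    return im & (256 - (1 << (8 - bits)))
```

For example, with bits=2 each channel is reduced to the four levels 0, 64, 128 and 192, producing the flat color regions characteristic of posterization.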
Now, let's demonstrate how the solarize() and posterize() functions from PIL.ImageOps can be used to solarize and posterize the color channels of an input image, respectively, using the following code snippet. Also, let's use the invert() method from the same module to invert the input image, and the effect_spread() function from the Image module to randomly spread pixels in the input image; this function accepts a parameter distance, which is the distance to spread the pixels.
im = Image.open("images/pepper.webp")
im_in, im_sol, im_post = ImageOps.invert(im), ImageOps.solarize(im), ImageOps.posterize(im, 2)
im_out = Image.new('RGB', (5*im.width, im.height))
im_out.paste(im, (0,0))
im_out.paste(im_in, (im.width,0))
im_out.paste(im_sol, (2*im.width,0))
im_out.paste(im_post, (3*im.width,0))
im_out.paste(im.effect_spread(30), (4*im.width,0))
plt.figure(figsize=(20,7))
plt.imshow(im_out), plt.axis('off')
plt.title('The original, inverted, solarized, posterized and effect_spread images with PIL', size=20)
plt.show()
Converting an image to gray scale¶
We can use the convert() function from PIL.Image with parameter 'L' to change a RGB color image to a grayscale
image as shown in the following code.
im = Image.open("images/parrot.jpg")
im_g = im.convert('L')
im_g
You can use the methods getpixel() / putpixel() too (along with the helper function to_gray()) to convert an RGB image to a grayscale image.
def to_gray(r, g, b):
    return 0.2989 * r + 0.5870 * g + 0.1140 * b # standard luminance weights

w, h = im.size
for i in range(w):
    for j in range(h):
        r, g, b = im.getpixel((i, j))
        gr = int(to_gray(r, g, b))
        im.putpixel((i, j), (gr, gr, gr))
im
The next figure shows the gray-level image, the output of the above code. We are going to use the grayscale image obtained for the next few gray-level transformations.
A few gray level point transformations¶
Here we explore a couple of transformations that map each pixel value of the input image, via a function, to a corresponding pixel value in the output image; the point() function can be used for this. Each pixel has a value between 0 and 255, inclusive.
Log transformation¶
The log transformation can be used to effectively compress an image that has a large dynamic range of pixel values. The next code uses the point transformation for a logarithmic transformation. As can be seen, the range of pixel values is narrowed, and the differences among the brighter pixels are compressed much more than those among the darker pixels.
im_g.point(lambda x: 255*np.log(1+x/255)).show()
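A quick numpy sketch confirms how this mapping compresses the gray-level range (the sample values are chosen arbitrarily):

```python
import numpy as np

r = np.linspace(0, 255, 4)        # equally spaced input gray levels
s = 255 * np.log(1 + r / 255)     # same mapping as the point() lambda above
# the output stays within [0, 255), is monotonically increasing, and equal
# input steps produce smaller and smaller output steps as r grows
```

This is because the slope of the mapping, $1/(1+x/255)$, decreases as $x$ increases, so bright values are squeezed together the most.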
Power-law transformation¶
This transformation is used as γ correction for an image. The next line of code shows how to
use the point() function for a power-law transformation with $γ = 0.6$.
im_g.point(lambda x: 255*(x/255)**0.6).show()
The next figure shows the log- and power-law-transformed output images produced from the grayscale image by running the above lines of code (use plt.subplot() to put them together).

We can use the point() function to apply an arbitrary transformation to each pixel (in all bands) of an RGB image too, as done in the next code snippet. The point() function can also accept a lookup table (LUT) and transform a grayscale image using the LUT (the next code snippet circular-shifts the gray-level values with np.roll()).
Like the Image.point() method, the ImagingCore.point_transform() method can also be used for a point transform (using the scale and offset arguments, but we need to convert the image mode to 'F' first), as shown below.
im1 = im.point(lambda x: x*np.tan(x))
im2 = im.point(lambda x: 255 if x > 128 else 0)
im3 = im.convert('L').point(np.roll(np.arange(256), 115))
im4 = im.copy()
im4.im = im4.convert('F').getdata().point_transform(10, 15)
plt.figure(figsize=(15,7))
plt.imshow(np.hstack((im1, im2, im3.convert('RGB'), im4))), plt.axis('off')
plt.title('point transforms with PIL Image and ImagingCore', size=15)
plt.show()
A few Geometric Transformations¶
Geometric image transformations involve manipulating the spatial relationships and orientation of images. These transformations can include rotation, scaling, translation, and other operations that alter the geometry of the image. Scikit-image is a Python library that provides various image processing tools, including functions for geometric transformations. The skimage.transform module in scikit-image is particularly useful for performing these operations.
Here's a brief overview of Geometric Image Transformations:
- Translation: Shifting the image along the x and y axes.
- Rotation: Rotating the image around a specified point.
- Scaling: Resizing the image by a specified factor.
- Shearing: Distorting the image by changing the angle of different axes.
- Affine: Combining translation, rotation, scaling, and shearing.
- Projective (Homography): Generalized transformation for perspective changes.
In this section we shall discuss about a set of geometric transformations that are done by multiplying appropriate matrices (often expressed in homogeneous coordinates) with the image matrix. These transformations change the geometric orientation of an image, hence the name.
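As a small illustration of multiplying such matrices (a sketch with our own helper affine_matrix(), not a library function): in homogeneous coordinates each elementary transform is a 3x3 matrix, and a composite transform is simply their product.

```python
import numpy as np

def affine_matrix(scale=1.0, theta=0.0, tx=0.0, ty=0.0):
    # compose scaling, rotation and translation into one 3x3 matrix
    S = np.diag([scale, scale, 1.0])
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s, 0], [s, c, 0], [0, 0, 1.0]])
    T = np.array([[1, 0, tx], [0, 1, ty], [0, 0, 1.0]])
    return T @ R @ S  # applied right-to-left: scale, then rotate, then translate

# transform the point (1, 0), written in homogeneous coordinates as (1, 0, 1)
p = affine_matrix(scale=2, theta=np.pi / 2, tx=1, ty=0) @ np.array([1.0, 0.0, 1.0])
```

Here scaling sends (1, 0) to (2, 0), the 90° rotation sends it to (0, 2), and the translation finally moves it to (1, 2).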
Reflecting an image¶
We can use the transpose() function to reflect an image w.r.t. the horizontal or
the vertical axis. (You can use the mirror() function from PIL.ImageOps module too)
im.transpose(Image.FLIP_LEFT_RIGHT).show() # reflect about the vertical axis
Rotating an image¶
We can use the rotate() function to rotate an image by an angle (in degrees).
im_45 = im.rotate(45) # rotate the image by 45 degrees
im_45.show() # show the rotated image
The next figure shows the rotated output images produced by applying reflection and rotation by running the above lines of code.

Applying an Affine transformation on an image¶
The following figure shows the 2D affine transformation matrix T to be applied to each pixel of an image (in homogeneous coordinates) for it to undergo an affine transformation. Different values of the constants in T change the transformation to a rotation, translation, scaling, shear, etc., or a combination of some / all of them. An interested reader is again advised to refer to this article to understand how these transformations can be implemented (from scratch).
$$ T = \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix} $$
For example, in order to rotate an image properly the image matrix must be multiplied by a series of matrices as shown below:
$$ \begin{bmatrix} 1 & 0 & c_x \\ 0 & 1 & c_y \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} \cos\theta & -\sin\theta & 0 \\ \sin\theta & \cos\theta & 0 \\ 0 & 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 & -c_x \\ 0 & 1 & -c_y \\ 0 & 0 & 1 \end{bmatrix} $$
where $(c_x, c_y)$ denotes the center of the image: translate the center to the origin, rotate, then translate back.
An affine transformation sends each pixel $f(x,y)$ of the input image to its corresponding location $(x', y') = T(x, y)$ in the output image. But what if the transformed pixel coordinates lie in between two pixels of the output image? This problem is often tackled with inverse mapping (warping). For each pixel at location $(x', y')$ in the output image, its value is obtained from the pixel value of the input image at the corresponding location $(x, y) = T^{-1}(x', y')$. If this location falls "between" two pixels of the input image, the pixel value is computed by interpolating (e.g., with bi-linear interpolation) the values of the neighboring pixels. The next figure describes the concept.
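The interpolation step just described can be sketched for a single-channel image as follows (bilinear_sample() is a hypothetical helper of ours, with im indexed as im[row, col]):

```python
import numpy as np

def bilinear_sample(im, x, y):
    # sample grayscale image `im` at the fractional location (x, y)
    # by weighting the four surrounding pixels (the inverse-warping step)
    x0, y0 = int(np.floor(x)), int(np.floor(y))
    x1 = min(x0 + 1, im.shape[0] - 1)
    y1 = min(y0 + 1, im.shape[1] - 1)
    a, b = x - x0, y - y0   # fractional offsets within the pixel cell
    return ((1 - a) * (1 - b) * im[x0, y0] + a * (1 - b) * im[x1, y0] +
            (1 - a) * b * im[x0, y1] + a * b * im[x1, y1])
```

For instance, sampling halfway between a pixel of value 0 and one of value 1 returns 0.5, the weighted average of the two.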

The following code shows the output image obtained when the input image is transformed with a shear transform matrix. The data argument of the transform() function is a 6-tuple $(a, b, c, d, e, f)$ containing the first two rows of an affine transform matrix. For each pixel $(x, y)$ in the output image, the new value is taken from position $(a x + b y + c, d x + e y + f)$ in the input image, rounded to the nearest pixel. The transform() function can be used to scale, translate, rotate, and shear the original image.
im = Image.open("images/parrot.png")
im.transform((int(1.4*im.width), im.height), Image.AFFINE, data=(1,-0.5,0,0,1,0)).show() # shear
Applying Perspective Transformation¶
Projective transformation (also called perspective transformation in the context of image processing and computer vision) is often used to correct or simulate perspective distortions in images. We can run a perspective transformation on an image with the transform() method using the Image.PERSPECTIVE argument, as shown in the next code block.
params = [1, 0.1, 0, -0.1, 0.5, 0, -0.005, -0.001]
im1 = im.transform((im.width//3, im.height), Image.PERSPECTIVE, params, Image.BICUBIC)
im1.show()
The next figure shows the images obtained after applying the shear and the perspective projection, respectively, by running the above code blocks.

Changing pixel values of an image¶
We can use the putpixel() function to change a pixel value in an image. Next, let us discuss a popular application of this function: adding noise to an image.
Adding salt and pepper noise to an image¶
We can add some salt-and-pepper noise to an image by selecting a few pixels from the image randomly and then setting about half of those pixel values to black and the other half to white. The next code snippet shows how to add the noise.
# choose 5000 random locations inside image
im1 = im.copy() # keep the original image, create a copy
n = 5000
x, y = np.random.randint(0, im.width, n), np.random.randint(0, im.height, n)
for (x, y) in zip(x, y):
    im1.putpixel((x, y), ((0,0,0) if np.random.rand() < 0.5 else (255,255,255))) # salt-and-pepper noise
im1.show()
The following figure shows the output noisy image generated by running the above code snippet.

Drawing on an image¶
We can draw lines and other geometric shapes on an image using functions from the PIL.ImageDraw module (e.g., the ellipse() function to draw an ellipse), as shown in the next python code fragment.
im = Image.open("images/parrot.png")
draw = ImageDraw.Draw(im)
draw.ellipse((125, 125, 200, 250), fill=(255,255,255,128))
del draw
im.show()

Drawing Bounding Boxes¶
The method Image.getbbox() calculates the bounding box of the non-zero regions in the input image, as demonstrated by the next code snippet.
with Image.open("images/blocks.png") as im:
    left, upper, right, lower = im.getbbox() # bounding box coordinates: (left, upper, right, lower)
    draw = ImageDraw.Draw(im, "RGB")
    draw.rectangle(((left, upper), (right, lower)), outline=(255, 0, 0)) # an outline color is needed for the box to be visible
plt.imshow(im)
Drawing text on an image¶
We can draw text on an image using the text() function from the PIL.ImageDraw module, as shown in the next python code fragment.
draw = ImageDraw.Draw(im)
font = ImageFont.truetype("arial.ttf", 23) # use a truetype font
draw.text((10, 5), "Welcome to image processing with python", font=font)
del draw
im.show()
The following figure shows the output images generated by running the above code snippets.

Pasting an image on top of another¶
We can use the paste() function to paste an image on top of another, as shown below. The following figure shows the output image generated by running the code snippet.
im = Image.open("images/parrot.png")
for i in range(5):
    im_small = im.resize((100,100))
    im.paste(im_small, (25,25))
plt.figure(figsize=(10,7))
plt.imshow(im), plt.axis('off')
plt.show()
Creating a Thumbnail¶
We can create a similar image as in the last example, but this time by creating a thumbnail from the image with the thumbnail() function, as shown in the next code block.
im = Image.open("images/parrot.png")
im_thumbnail = im.copy() # need to copy the image first
im_thumbnail.thumbnail((100,100))
im.paste(im_thumbnail, (10,10))
im.show()
The following figure shows the output image generated by running the above code snippet.
Computing the basic statistics of an image¶
We can use the stat module to compute the basic statistics (mean, median, standard deviation of pixel values of different channels etc.) of an image, as shown in the next code block.
s = stat.Stat(im)
print(s.extrema) # maximum and minimum pixel values for each channel R, G, B
print(s.count)
print(s.mean)
print(s.median)
print(s.stddev)
[(4, 255), (0, 255), (0, 253)]
[154020, 154020, 154020]
[127.62241916634203, 124.89540319439034, 67.81682249058564]
[121, 130, 62]
[47.486354000919064, 52.13537728737721, 39.689444601928486]
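On a synthetic constant image, the statistics are easy to verify (a quick sanity check of ours, not from the text):

```python
from PIL import Image
import PIL.ImageStat as stat

im_const = Image.new('L', (10, 10), 100)  # 100 pixels, all with value 100
s2 = stat.Stat(im_const)
# one entry per band: pixel count, mean, and (min, max)
print(s2.count, s2.mean, s2.extrema)  # [100] [100.0] [(100, 100)]
```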
Plotting the histograms of pixel values for the RGB channels of an image¶
The function histogram() can be used to compute the histogram (a table of pixel values vs. frequencies) for each channel; it returns the concatenated output (e.g., for an RGB image, the output contains 3 × 256 = 768 values). The following figure shows the R, G, B color histograms plotted by running the code snippet.
pl = im.histogram()
plt.bar(range(256), pl[:256], color='r', alpha=0.5)
plt.bar(range(256), pl[256:2*256], color='g', alpha=0.4)
plt.bar(range(256), pl[2*256:], color='b', alpha=0.3)
plt.show()
Separating the RGB channels of an image¶
We can use the split() function to separate the channels of a multi-channel image, as is
shown in the following code for an RGB image.
ch_r, ch_g, ch_b = im.split() # split the RGB image into 3 channels: R, G and B
# we shall use matplotlib to display the channels
plt.figure(figsize=(18,6))
plt.subplot(1,3,1); plt.imshow(ch_r, cmap=plt.cm.Reds); plt.axis('off')
plt.subplot(1,3,2); plt.imshow(ch_g, cmap=plt.cm.Greens); plt.axis('off')
plt.subplot(1,3,3); plt.imshow(ch_b, cmap=plt.cm.Blues); plt.axis('off')
plt.tight_layout()
plt.show() # show the R, G, B channels
The above figure shows 3 output images created for each of the R,G,B channels generated by running the above code snippet.
Combining multiple channels of an image¶
We can use the merge() function to combine the channels of a multi-channel image, as is shown in the
following code, where the color channels obtained by splitting the parrot RGB image are merged after swapping
the red and blue channels.
im = Image.merge('RGB', (ch_b, ch_g, ch_r)) # swap the red and blue channels obtained last time with split()
im.show()
The following figure shows the RGB output image created by merging the B,G,R channels, by running the above code snippet.

The alpha (α) channel and image blending¶
The alpha channel in an image is an additional channel that represents the transparency or opacity of each pixel. It is often used in images with varying levels of transparency, allowing for smooth blending of the image with a background or other images.
The alpha channel is an integral part of the RGBA color model, where each pixel is represented by four components: Red, Green, Blue, and Alpha. The PNG image format supports alpha channels, making it a popular choice for images that require transparency. JPEG, on the other hand, does not support alpha channels.
The alpha channel allows images to have pixel-level transparency, making it possible to create smooth transitions between the image and a background.
An alpha value of 0 indicates complete transparency, meaning the pixel is fully see-through. An alpha value of 255 (or 1.0) indicates full opacity, making the pixel fully visible.
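This behaviour is easy to check on a tiny synthetic example (ours, not from the text): a fully transparent foreground pixel lets the background show through, while a fully opaque one hides it.

```python
import numpy as np
from PIL import Image

bg = Image.new('RGBA', (2, 1), (0, 255, 0, 255))   # opaque green background
fg = Image.new('RGBA', (2, 1), (255, 0, 0, 255))   # opaque red foreground
fg.putpixel((1, 0), (255, 0, 0, 0))                # make the second pixel fully transparent
out = np.array(Image.alpha_composite(bg, fg))
print(out[0, 0][:3], out[0, 1][:3])  # red where the foreground is opaque, green where transparent
```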
Convert a jpg image to a transparent png image¶
To convert a JPEG image to a transparent PNG with PIL, you can follow these steps:
- Open the JPEG image using Image.open().
- Convert to RGBA mode using convert("RGBA") to add an alpha channel.
- Set the alpha channel based on a condition (e.g., a transparency threshold based on the RGB values).
- Save the resulting image as a PNG using Image.save().
The next code snippet reads the parrot JPG image, extracts the regions corresponding to the red feathers, and saves the result as a transparent PNG, as shown in the next figure.
import numpy as np
from PIL import Image
img = Image.open('images/parrot.jpg') # n x m x 3
imga = img.convert("RGBA") # n x m x 4
imga = np.asarray(imga)
r, g, b, a = np.rollaxis(imga, axis=-1) # split into 4 n x m arrays
r_m = r > 100 # binary mask for red channel, True for red feathers
g_m = g < 100 # binary mask for green channel, True for red feathers
b_m = b < 200 # binary mask for blue channel, True for red feathers
# combine the three masks using the binary "and" operation
# multiply the combined binary mask with the alpha channel
a = a * ((r_m == 1) & (g_m == 1) & (b_m == 1))
# stack the img back together
imga = Image.fromarray(np.dstack([r, g, b, a]), 'RGBA')
imga.save('images/parrot_transparent.png')
print(img.mode, imga.mode)
RGB RGBA

α-blending two images¶
The function PIL.Image.blend() can be used to create a new image by interpolating between two given images, using a constant α. Both images must have the same size and mode.
The output image is given by $out = image_1 \cdot (1.0 - \alpha) + image_2 \cdot \alpha$
If α is 0.0, a copy of the first image is returned. If α is 1.0, a copy of the second image is returned. The next code snippet shows an example.
im1 = Image.open("images/parrot.png")
im2 = Image.open("images/hill.png")
# 453 340 1280 960 RGB RGBA
#im1 = im1.convert('RGBA') # if two images have different modes, must be converted to the same mode
im2 = im2.resize((im1.width, im1.height), Image.BILINEAR) # the two images have different sizes, must be converted to the same size
Image.blend(im1, im2, alpha=0.5).show()
The following figure shows the output image generated by blending the above two images by running the above code snippet.

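The interpolation formula can be verified directly on small synthetic images (a sanity check of ours):

```python
import numpy as np
from PIL import Image

im1 = Image.new('RGB', (2, 2), (100, 0, 0))
im2 = Image.new('RGB', (2, 2), (0, 200, 0))
out = np.array(Image.blend(im1, im2, alpha=0.5))
# every pixel is 0.5*(100, 0, 0) + 0.5*(0, 200, 0) = (50, 100, 0)
print(out[0, 0])
```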
α-compositing two images¶
The method Image.alpha_composite(a, b) composites the image $b$ over the image $a$. The alpha value and the pixel values of the generated image are given by:

$\alpha_o = \alpha_b + \alpha_a (1 - \alpha_b), \quad v_o = \dfrac{v_b \alpha_b + v_a \alpha_a (1 - \alpha_b)}{\alpha_o}$

where $\alpha_a$, $\alpha_b$ represent the transparency values and $v_a$, $v_b$ represent the pixel values of the images $a$ and $b$ (RGBA images), respectively. The next code snippet uses this function to obtain a composite of two images; it first converts the input RGB images to RGBA using the method Image.putalpha().
fish = Image.open('images/fish.png')
sea = Image.open('images/sea.png')
fish.putalpha(100)
sea.putalpha(200)
alpha_comp = Image.alpha_composite(sea, fish)
plt.figure(figsize=(15,5))
plt.imshow(np.hstack((np.array(fish), np.array(sea), np.array(alpha_comp)))), plt.axis('off')
plt.title('alpha composite of the fish and sea images', size=20)
plt.show()
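Image.alpha_composite() can be cross-checked against a direct numpy implementation of the standard "over" operator on synthetic RGBA images (our sketch, not from the text):

```python
import numpy as np
from PIL import Image

def alpha_composite_np(a, b):
    # composite image b over image a, following the "over" operator
    a = np.asarray(a, float) / 255; b = np.asarray(b, float) / 255
    aa, ab = a[..., 3:], b[..., 3:]
    ao = ab + aa * (1 - ab)                     # output alpha
    vo = (b[..., :3] * ab + a[..., :3] * aa * (1 - ab)) / np.maximum(ao, 1e-8)
    return np.dstack((vo, ao))

a = Image.new('RGBA', (4, 4), (0, 0, 255, 200))   # semi-transparent blue
b = Image.new('RGBA', (4, 4), (255, 0, 0, 100))   # semi-transparent red
ref = np.asarray(Image.alpha_composite(a, b), float) / 255
out = alpha_composite_np(a, b)   # should match PIL up to integer rounding
```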
Adding watermark to an image¶
A watermark is a recognizable image or pattern that is superimposed onto another image, typically for branding or copyright purposes. Watermarks are often semi-transparent and placed in a corner or across the center of an image to discourage unauthorized use or to identify the creator of the image.
The next code snippet adds the Packt logo image to the parrot cover image, using the following steps.
- First, both the original input (cover) and watermark images are converted to RGBA mode.
- The watermark image is resized and pasted onto a blank layer; subsequently, a semi-transparent alpha value ($\alpha=128$) is applied to a copy of the layer, which is pasted back using itself as the mask.
- Finally, the function alpha_composite() is used to composite the watermark layer over the original image. The watermarked output produced is shown in the next figure.
cover_image = Image.open('images/parrot.jpg').convert('RGBA')
watermark = Image.open('images/packt.png').resize((100,50)).convert('RGBA')
layer = Image.new('RGBA', cover_image.size, (0, 0, 0, 0))
layer.paste(watermark, (325, 225))
# Create a copy of the layer
layer2 = layer.copy()
# Put alpha on the copy
layer2.putalpha(128)
# merge layers with mask
layer.paste(layer2, layer)
watermarked_image = Image.alpha_composite(cover_image, layer)
watermarked_image.convert('RGB')
Blending two images with a mask¶
You can use the Image.composite() function in the Python Imaging Library (PIL) to blend two images with a transparency mask.
The Image.composite() function takes three images as input arguments: foreground, background, and mask. The mask image is a grayscale image with the same size as the other two images; the pixel values in the mask control the transparency of the corresponding pixels in the result, ranging from 0 (fully transparent) to 255 (fully opaque).
The function returns a new image that is the result of the compositing operation.
In this example, we shall create a composite image by compositing a butterfly image with its negative (i.e., inverted) image, using a horizontal intensity gradient mask grad_mask, as shown in the next code snippet. As can be seen from the next figure, the output consists of the left part of the inverted image and the right part of the original image (since, according to the transparency gradient mask, the original image is transparent on the left side and opaque on the right side).
from PIL import Image, ImageOps
# Load an image and invert
im = Image.open('images/butterfly.png')
w, h = im.size
im_inv = ImageOps.invert(im)
# Create an intensity gradient mask
grad_mask = np.linspace(0, 1, w)
grad_mask = Image.fromarray((255*np.tile(grad_mask, (h, 1))).astype(np.uint8))
# Composite the image with its inverted version using the gradient mask
im_out = Image.composite(im, im_inv, grad_mask)
print(im.mode, im_inv.mode, grad_mask.mode, im_out.mode)
plt.figure(figsize=(18,12))
plt.gray()
plt.subplot(221), plt.imshow(im), plt.title('original', size=20), plt.axis('off')
plt.subplot(222), plt.imshow(im_inv), plt.title('inverted', size=20), plt.axis('off')
plt.subplot(223), plt.imshow(grad_mask), plt.title('intensity gradient mask', size=20), plt.axis('off')
plt.subplot(224), plt.imshow(im_out), plt.title('composite image', size=20), plt.axis('off')
plt.tight_layout()
plt.show()
RGB RGB L RGB
Now, use just numpy arrays (don't use PIL methods) to compute a composite image; this is left as an exercise for the reader (it should not need more than three lines of code).
Adding and differencing two images¶
The next code snippet shows how an image can be generated by adding two input images (of the same size) pixel by pixel, using the function PIL.ImageChops.add().
im1 = Image.open("images/parrot.png")
im2 = Image.open("images/hill.png").convert('RGB').resize((im1.width,im1.height))
add(im1, im2).show()
The next line of code uses the function PIL.ImageChops.difference() to obtain the absolute value of the pixel-by-pixel difference between the images.
difference(im1, im2).show()
The following figure shows the output images generated by running the above code snippets, along with the input images.

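The semantics of add() (including its scale and offset arguments, and clipping) and difference() can be checked on one-pixel images (our sanity check):

```python
from PIL import Image
from PIL.ImageChops import add, difference

p = Image.new('L', (1, 1), 200)
q = Image.new('L', (1, 1), 100)
total = add(p, q).getpixel((0, 0))             # 200 + 100 = 300, clipped to 255
halved = add(p, q, scale=2).getpixel((0, 0))   # (200 + 100) / 2 = 150
diff = difference(p, q).getpixel((0, 0))       # |200 - 100| = 100
```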
Change detection¶
Image differencing can be used to detect changes between two images. For example, the next code block shows how to compute the difference image from two successive frames of a video recording (from YouTube) of a match in the football world cup 2018. The next figure shows the output of the code snippet: the consecutive frame images, followed by their difference.
def diff_numpy(im1, im2): # rgb2gray returns float pixel values in [0,1]
    return np.abs(color.rgb2gray(np.array(im1)) - color.rgb2gray(np.array(im2)))
def brighten(im):
    return np.clip(255*im/im.max() + 50, 0, 255).astype(np.uint8) # clip before casting to avoid uint8 overflow
im1 = Image.open("images/goal1.png")
im2 = Image.open("images/goal2.png")
#im = screen(im1, im2)
im = difference(im1, im2)
im.save("images/goal_diff.png")
plt.figure(figsize=(15,9))
plt.subplot(221), plt.imshow(im1), plt.axis('off'), plt.title('frame 1', size=20)
plt.subplot(222), plt.imshow(im2), plt.axis('off'), plt.title('frame 2', size=20)
plt.subplot(223), plt.imshow(im), plt.axis('off'), plt.title('difference (PIL)', size=20)
plt.subplot(224), plt.imshow(brighten(diff_numpy(im1, im2)), cmap='gray'), plt.axis('off'), plt.title('difference (numpy)', size=20)
plt.tight_layout()
plt.show()
Subtracting two images¶
The next code snippet shows how to first subtract two images, then divide the result by the scale and add the offset provided as arguments to the function PIL.ImageChops.subtract(). The scale defaults to 1.0 and the offset to 0.0. The following figure shows the output image generated by running the code snippet below. Note that, unlike difference(), the two input images can have different sizes.
im1 = Image.open("images/parrot.png").convert('RGB')
im2 = Image.open("images/hill.png").convert('RGB')
im = subtract(im1, im2, scale=1.0, offset=0)
print(im1.size, im2.size, im.size)
im.save("images/parrot_hill_subtract.jpg")
#im.show()
plt.figure(figsize=(12,5))
plt.imshow(np.hstack((im, np.array(difference(im, im1))))), plt.axis('off')
plt.title('subtract images and take difference of the output from the first image', size=15)
plt.show()
(453, 340) (1280, 960) (453, 340)
Superimposing two images with soft_light, hard_light, overlay algorithms¶
Let's now demonstrate how to superimpose images using a few algorithms often referred to as blending modes. These are mathematical algorithms that dictate how pixels from one image interact with pixels from another image when the two images are combined. These modes control the way pixel values are combined to create a new image. Here are explanations of some common blending modes, along with their blending equations:
Normal (or Over): $\text{result} = \alpha \times \text{foreground} + (1 - \alpha) \times \text{background}$, where α is the transparency value of the foreground pixel, ranging from 0 (completely transparent) to 1 (completely opaque). The resulting pixel is a linear combination of the foreground and background pixels based on their transparency.
Hard Light: $result = \begin{cases} 2 \times \text{background} \times \text{foreground}, & \text{if foreground } \leq 0.5 \\ 1 - 2 \times (1 - \text{background}) \times (1 - \text{foreground}), & \text{otherwise } \end{cases}$
This mode simulates shining a harsh spotlight on the image. It produces strong contrast and vivid colors.
- Soft Light: $\text{result} = (1 - 2 \times \text{foreground}) \times \text{background}^2 + 2 \times \text{foreground} \times \text{background}$ (the "pegtop" formulation; several variants of soft light exist)
This mode produces a subtle blending of the two images, creating a soft and diffused effect.
- Overlay:
$result = \begin{cases} 2 \times \text{background} \times \text{foreground}, & \text{if background } \leq 0.5 \\ 1 - 2 \times (1 - \text{background}) \times (1 - \text{foreground}), & \text{otherwise } \end{cases}$
This mode enhances the contrast and saturation of the image. It combines the effects of both multiply and screen blending.
These algorithms manipulate pixel values based on the characteristics of the blending mode. The specifics of each algorithm determine how pixels interact, and the choice of blending mode can significantly impact the visual appearance of the resulting image. Keep in mind that these explanations are simplified, and the actual implementations might involve additional considerations for different color channels and color spaces.
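The branch equations above are straightforward to express with numpy on float images in $[0, 1]$ (a sketch of ours; the `_np` suffix avoids clashing with the PIL.ImageChops functions of the same names). Note that the only difference between hard light and overlay is which layer drives the branch:

```python
import numpy as np

def hard_light_np(bg, fg):
    # branch on the *foreground* value, per the hard-light equation above
    return np.where(fg <= 0.5, 2 * bg * fg, 1 - 2 * (1 - bg) * (1 - fg))

def overlay_np(bg, fg):
    # same two branches, but the condition is on the *background*
    return np.where(bg <= 0.5, 2 * bg * fg, 1 - 2 * (1 - bg) * (1 - fg))

rng = np.random.default_rng(0)
bg, fg = rng.random((8, 8, 3)), rng.random((8, 8, 3))
# overlay is hard light with the two layers swapped
assert np.allclose(overlay_np(bg, fg), hard_light_np(fg, bg))
```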
with PIL¶
The functions hard_light(), soft_light() and overlay() from the module PIL.ImageChops can be used to superimpose two images using the above algorithms, as demonstrated in the next code snippet; the outputs obtained are shown in the next figure.
im1 = Image.open("images/scene.jpg")
im2 = Image.open("images/turtle.jpg").convert('RGB').resize((im1.width, im1.height))
im_hard = hard_light(im1, im2)
im_soft = soft_light(im1, im2)
im_overlay = overlay(im1, im2)
plt.figure(figsize=(20,7))
plt.imshow(np.hstack((im_hard, im_soft, im_overlay))), plt.axis('off')
plt.title('blending with hard-light, soft-light and overlay algorithms', size=20)
plt.show()
with blend-mode¶
# the opacity argument controls how much of the foreground is blended onto the background
def rgb2rgba(im): # small helper replacing imgviz.rgb2rgba (imgviz is not imported): add a fully opaque alpha channel
    return np.dstack((im, 255*np.ones(im.shape[:2])))
plt.figure(figsize=(15,5))
i = 1
for opacity in [0.3, 0.5, 0.8]:
    blended_img_float = blend_modes.soft_light(rgb2rgba(np.array(im1)).astype(float),
                                               rgb2rgba(np.array(im2)).astype(float), opacity)
    plt.subplot(1,3,i), plt.imshow(np.uint8(blended_img_float)), plt.axis('off'), plt.title(f'opacity={opacity}', size=20)
    i += 1
plt.suptitle('Blending with soft-light algorithm with blend-mode', size=25)
plt.tight_layout()
plt.show()

Superimposing two image negatives¶
The next code block shows how to superimpose two inverted images on top of each other
using the screen() function.
im1 = Image.open("images/parrot.png")
im2 = Image.open("images/hill.png").convert('RGB').resize((im1.width, im1.height))
screen(im1, im2).show()
Getting darker / lighter of two images¶
The function PIL.ImageChops.lighter() compares the two images, pixel by pixel, and returns a new image containing the lighter values; the analogous darker() function returns the darker values.
im1 = Image.open("images/parrot.png")
im2 = Image.open("images/hill.png").convert('RGB').resize((im1.width, im1.height))
lighter(im1, im2).show()
The following figure shows the output images generated by running the above code snippets.

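lighter() computes a per-channel maximum (and darker() the per-channel minimum), which can be seen on a one-pixel example (ours):

```python
from PIL import Image
from PIL.ImageChops import lighter, darker

a = Image.new('RGB', (1, 1), (10, 200, 30))
b = Image.new('RGB', (1, 1), (100, 50, 30))
li = lighter(a, b).getpixel((0, 0))  # per-channel maximum
da = darker(a, b).getpixel((0, 0))   # per-channel minimum
print(li, da)  # (100, 200, 30) (10, 50, 30)
```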
Changing image quality / resolution¶
im = Image.open('images/parrot.png')
qualities = [95, 20, 5, 1]
for quality in qualities:
    im.save(f"images/parrot_{quality}.jpg", quality=quality)
plt.figure(figsize=(15,6))
plt.imshow(np.hstack([imread(f"images/parrot_{quality}.jpg") for quality in qualities])), plt.axis('off')
plt.title('Changing image quality with PIL (with qualities: {})'.format(qualities), size=20)
plt.show()
Deforming an Image¶
# https://pythoninformer.com/python-libraries/pillow/imageops-deforming/
class WaveDeformer:
    def transform(self, x, y):
        y = y + 20*np.sin(x/15) # sinusoidal vertical displacement
        return x, y
    def transform_rectangle(self, x0, y0, x1, y1):
        return (*self.transform(x0, y0),
                *self.transform(x0, y1),
                *self.transform(x1, y1),
                *self.transform(x1, y0),
                )
    def getmesh(self, img):
        self.w, self.h = img.size
        gridspace = 20
        target_grid = []
        for x in range(0, self.w, gridspace):
            for y in range(0, self.h, gridspace):
                target_grid.append((x, y, x + gridspace, y + gridspace))
        source_grid = [self.transform_rectangle(*rect) for rect in target_grid]
        return [t for t in zip(target_grid, source_grid)]
im = Image.open('images/lena.jpg')
im_deformed = ImageOps.deform(im, WaveDeformer())
plt.figure(figsize=(12,5))
plt.imshow(np.hstack((np.array(im), np.array(im_deformed)))), plt.axis('off')
plt.title('The original and the deformed image', size=20)
plt.show()
Colorize a grayscale image¶
im = Image.open('images/girl.png').convert('L')
im_colorized = ImageOps.colorize(im, black=(255,0,0), white=(255,255,127)) #, mid=(255,0,255))
plt.figure(figsize=(12,5))
plt.imshow(np.hstack((color.gray2rgb(np.array(im)), np.array(im_colorized)))), plt.axis('off')
plt.title('The original and the colorized image', size=20)
plt.show()
Changing image colors from LUT files¶
lut = load_cube_file("images/Bourbon 64.cube") #NightFromDay.CUBE")
im = Image.open("images/day.png")
plt.figure(figsize=(10,6))
im_in = np.array(im)
im_out = np.array(im.filter(lut))
plt.imshow(np.hstack((im_in[:,:im_in.shape[1]//2], im_out[:,im_out.shape[1]//2:]))), plt.axis('off')
plt.show()
γ-Correction with LUT¶
You can use a look-up table (LUT) to implement gamma correction. A lookup table is essentially an array or mapping that defines how each pixel value in the input image should be transformed to obtain a corresponding output value.
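For 8-bit images the table has just 256 entries, and in numpy the whole transformation is a single fancy-indexing operation (a sketch of ours):

```python
import numpy as np

gamma = 0.4
# one output value for every possible 8-bit input value
table = (255 * (np.arange(256) / 255) ** gamma).astype(np.uint8)
pixels = np.random.randint(0, 256, (4, 4, 3), dtype=np.uint8)
mapped = table[pixels]   # indexing applies the table to every pixel at once
```

Since γ < 1 here, the table brightens mid-tones while keeping 0 and 255 fixed.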
with PIL¶
The class PIL.ImageFilter.Color3DLUT can be used as a 3D color lookup table. It transforms 3-channel pixels using the values of the channels as coordinates in the 3D lookup table, interpolating between the nearest elements. It allows applying almost any color transformation in constant time by using pre-calculated decimated tables.
- The function ImageFilter.Color3DLUT.generate() can be used to generate a LUT of a given size (e.g., an $11\times 11\times 11$ LUT is used in the next code snippet), with a lambda function passed as the callback argument.
- Next, the function Image.filter() can be used to perform the γ-correction of the input image with the LUT passed as a parameter (there is no need to iterate over the pixels), as shown in the code snippet below; the next figure shows the output.
γ = 0.4
img = Image.open('images/dark_house.png')
LUT = ImageFilter.Color3DLUT.generate((11,11,11), lambda r, g, b: (r**γ, g**γ, b**γ))
img_out = img.filter(LUT)
plt.figure(figsize=(15,6))
plt.imshow(np.hstack((np.array(img), np.array(img_out)))), plt.axis('off')
plt.title('original and γ-corrected image', size=20)
plt.show()
You can implement γ-correction using just the point() function instead (with a single line of code); this is left as an exercise.
with opencv-python¶
You can implement the same with opencv-python's cv2.LUT() function, wrapped in the function γ_correct() below, to get similar output, as shown in the next code snippet.
def γ_correct(img, γ):
    table = [((i / 255) ** γ) * 255 for i in range(256)]
    table = np.array(table, np.uint8)
    return cv2.LUT(img, table)
img = np.array(img)
plt.figure(figsize=(15,6))
plt.imshow(np.hstack((img, γ_correct(img, γ)))), plt.axis('off')
plt.title('original and γ-corrected image', size=20)
plt.show()
Simulating Lens Blur Effects¶
from PIL import ImageFilter
image = Image.open('images/me8.jpg').convert('RGB')
mask = Image.open('images/mask.png').convert('L')
p = 0.5
r = 5
gammaCorrectedImage = image.point(lambda x: 255*(x/255) ** p) # γ-correct (normalizing to [0,1] avoids heavy banding); use whatever power works for you
bokeh = gammaCorrectedImage.filter(ImageFilter.GaussianBlur(radius=r)) #ImageFilter.BLUR)
bokeh = bokeh.point(lambda x: 255*(x/255) ** (1/p)) # invert the γ-correction to get back the original gamma
bokeh = Image.composite(image, bokeh, mask)
blurImage = image.filter(ImageFilter.GaussianBlur(radius=r)) #ImageFilter.BLUR) # #
blurImage = Image.composite(image, blurImage, mask)
finalImage = lighter(bokeh, blurImage)
finalImage.show()

Radial Blur¶
Radial blur is a photographic and image processing technique where the center of an image is in focus while the surrounding areas appear blurred in a radial pattern. This effect is often used to convey a sense of motion or to draw attention to a specific point in the image.
# Define width and height of image
W, H = 650, 650
# Create a solid white image
im = Image.new(mode='RGB', size=(W,H), color=(255,255,255))
# Create a radial alpha/transparency layer: 255 in the centre, 0 at the edges
Y = np.linspace(-1, 1, H)[None, :]*255
X = np.linspace(-1, 1, W)[:, None]*255
alpha = np.sqrt(X**2 + Y**2)
alpha = 255 - np.clip(alpha, 0, 255)
# Push that radial gradient transparency onto the white image
im.putalpha(Image.fromarray(alpha.astype(np.uint8)))
im1 = Image.open('images/me8.jpg').convert('RGBA')
im2 = Image.alpha_composite(im1, im.resize(im1.size))
im2.show()

Interchanging Image Color Palettes¶
import os
import warnings
warnings.filterwarnings('ignore')
img1, img2 = Image.open('images/parrot.jpg'), Image.open('images/lena.jpg')
img1_p = img1.convert("P", palette=Image.ADAPTIVE, colors=16)
img1_p.save('images/parrot_adaptive16_orig.png')
img2_p = img2.convert("P", palette=Image.ADAPTIVE, colors=16)
img2_p.save('images/lena_adaptive16_orig.png')
print(os.stat('images/parrot.jpg').st_size, os.stat('images/parrot_adaptive16_orig.png').st_size)
data = list(img1_p.getdata())
print(data[50:70])
clrs = img1_p.getcolors()
print(len(clrs))
# show up colors and palette
pal1 = img1_p.getpalette()
pal2 = img2_p.getpalette()
p_nzero = pal1[:len(clrs) * 3]
clrs.sort(key=lambda x: x[1]) # sort by color, for showing
for v in [(clrs[i // 3], (p_nzero[i], p_nzero[i + 1], p_nzero[i + 2]))
          for i in range(0, len(p_nzero), 3)]:
    print(v)
img1_p.putpalette(pal2)
img1_p.save('images/parrot_adaptive16_remapped.png')
img2_p.putpalette(pal1)
img2_p.save('images/lena_adaptive16_remapped.png')
21786 21657
[13, 13, 13, 13, 13, 14, 13, 13, 13, 13, 13, 12, 12, 12, 12, 12, 12, 6, 6, 6]
16
((8112, 0), (192, 223, 104))
((11280, 1), (169, 189, 114))
((8606, 2), (158, 171, 92))
((11407, 3), (118, 166, 76))
((4850, 4), (137, 149, 84))
((6388, 5), (102, 146, 80))
((14639, 6), (90, 136, 92))
((12331, 7), (90, 138, 59))
((10268, 8), (193, 78, 37))
((3774, 9), (135, 93, 46))
((12598, 10), (176, 43, 29))
((10832, 11), (147, 39, 27))
((11902, 12), (79, 117, 82))
((8417, 13), (76, 116, 55))
((11559, 14), (66, 98, 61))
((7057, 15), (56, 72, 48))

Generating Image with PIL¶
PIL can be used to generate images of the following types:
- Mandelbrot set: with the function effect_mandelbrot(), covering the given extent. The parameters it accepts are: size (the requested size in pixels, as a 2-tuple (width, height)), extent (the extent to cover, as a 4-tuple (x0, y0, x1, y1)) and quality (the quality of the image).
- Gaussian noise, centered around the gray value 128: with the function effect_noise(). The parameters it accepts are: size (the requested size in pixels, as a 2-tuple (width, height)) and sigma (the standard deviation of the noise).
- 256 x 256 linear gradient from black to white, top to bottom: with the function linear_gradient(), accepting the parameter mode (input mode).
- 256 x 256 radial gradient from black to white, centre to edge: with the function radial_gradient(), again accepting the parameter mode (input mode).
The next code snippet demonstrates the generation of the images.
from PIL.Image import effect_mandelbrot, effect_noise, linear_gradient, radial_gradient
sz = (256, 256)
plt.figure(figsize=(20,8))
plt.imshow(np.hstack((
effect_mandelbrot(size=sz, extent=(-2, -1.5, 1, 1.5), quality=100),
effect_noise(size=sz, sigma=5),
linear_gradient(mode='L'),
radial_gradient(mode='L')
)))
plt.axis('off')
plt.title('generating images with PIL', size=20)
plt.show()
Applying fish-eye distortion¶
def apply_fisheye_effect(img: np.ndarray, K: np.ndarray, d: np.ndarray) -> np.ndarray:
    indices = np.array(np.meshgrid(range(img.shape[0]), range(img.shape[1]))).T \
                .reshape(np.prod(img.shape[:2]), -1).astype(np.float32)
    Kinv = np.linalg.inv(K)
    indices1 = np.zeros_like(indices, dtype=np.float32)
    for i in range(len(indices)):
        x, y = indices[i]
        indices1[i] = (Kinv @ np.array([[x], [y], [1]])).squeeze()[:2]
    indices1 = indices1[np.newaxis, :, :]
    in_indices = cv2.fisheye.distortPoints(indices1, K, d)
    indices, in_indices = indices.squeeze(), in_indices.squeeze()
    distorted_img = np.zeros_like(img)
    for i in range(len(indices)):
        x, y = indices[i]
        ix, iy = in_indices[i]
        if (ix < img.shape[0]) and (iy < img.shape[1]):
            distorted_img[int(ix),int(iy)] = img[int(x),int(y)]
    in_indices = in_indices.astype(int)
    #distorted_img = distorted_img[np.min(in_indices[:,0]):np.max(in_indices[:,0]), np.min(in_indices[:,1]):np.max(in_indices[:,1])]
    #plt.imsave('images/chess_distorted.png', distorted_img)
    return distorted_img
K = np.array( [[338.37324094,0,319.5],[0,339.059099,239.5],[0,0,1]],dtype=np.float32) # camera intrinsic params
d = np.array([0.17149, -0.27191, 0.25787, -0.08054],dtype=np.float32) # k1, k2, k3, k4 - distortion coefficients
img = plt.imread('images/parrot.jpg')
img = img / img.max()
distorted_img = apply_fisheye_effect(img, K, d)
plt.figure(figsize=(12,5))
plt.subplot(121), plt.imshow(img, aspect='auto'), plt.axis('off'), plt.title('original', size=20)
plt.subplot(122), plt.imshow(distorted_img, aspect='auto'), plt.axis('off'), plt.title('distorted', size=20)
plt.tight_layout()
plt.show()
Remove fish-eye distortion¶
# https://stackoverflow.com/questions/67257397/opencv-undistortpoints-doesnt-undistort?noredirect=1&lq=1
# https://docs.opencv.org/4.x/dc/dbb/tutorial_py_calibration.html
# camera parameters, to be computed during calibration
K = np.array( [[338.37324094,0,319.5],[0,339.059099,239.5],[0,0,1]],dtype=np.float64)
D = np.array( [[ 0.01794191], [-0.12190366],[ 0.14111533],[-0.09602948]],dtype=np.float64)
new_size = (640, 480)
Knew = K.copy()
#alpha = 0.6
#Knew, roi = cv2.getOptimalNewCameraMatrix(K, D, new_size, alpha, new_size,centerPrincipalPoint = True)
unfishmap1, unfishmap2 = cv2.fisheye.initUndistortRectifyMap(K, D, np.eye(3), Knew, new_size, cv2.CV_32F)
#img = cv2.imread('images/3FYUT.jpg')
img = plt.imread('images/chess_distorted.png')
img_undistorted = cv2.remap(img, unfishmap1, unfishmap2, interpolation=cv2.INTER_LINEAR, borderMode=cv2.BORDER_CONSTANT)
#x, y, w, h = roi # roi is returned by getOptimalNewCameraMatrix() above (commented out)
#img_undistorted = img_undistorted[y:y+h, x:x+w]
plt.figure(figsize=(15,7))
plt.subplot(121), plt.imshow(img, aspect='auto'), plt.axis('off'), plt.title('distorted', size=20)
plt.subplot(122), plt.imshow(img_undistorted, aspect='auto'), plt.axis('off'), plt.title('undistorted', size=20)
plt.tight_layout()
plt.show()
with defisheye¶
from defisheye import Defisheye
dtype = 'linear'
format = 'fullframe'
fov = 180
pfov = 120
img = "images/fishy.jpeg"
dtypes = ['linear', 'equalarea', 'stereographic']
cor_imgs = []
for dtype in dtypes:
#img_out = f"images/out/example3_{dtype}_{format}_{pfov}_{fov}.jpg"
obj = Defisheye(img, dtype=dtype, format=format, fov=fov, pfov=pfov)
# To save image locally
#obj.convert(outfile=img_out)
cor_img = obj.convert()
#print(cor_img.shape)
cor_imgs.append(cv2.cvtColor(cor_img, cv2.COLOR_BGR2RGB))
plt.figure(figsize=(15,7))
plt.imshow(np.hstack((plt.imread(img), cor_imgs[0], cor_imgs[1], cor_imgs[2]))), plt.axis('off')
plt.title('Fish eye correction with defisheye', size=20)
plt.show()
Homography - a Geometric Image Transformation¶
Homography refers to a mathematical transformation that maps points in one plane to corresponding points in another plane. In computer vision and image processing, homography is often used to describe the transformation between two images of the same scene taken from different viewpoints or under different perspectives.
A homography matrix (H) is a $3\times 3$ matrix that represents the linear transformation between two planes in projective geometry. It can be used to map the coordinates of points in one image to their corresponding coordinates in another image.
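To make the mapping concrete, the following minimal sketch (with an arbitrary, made-up matrix $H$, not derived from any image in this chapter) applies a homography to a single point: the point is lifted to homogeneous coordinates, multiplied by $H$, and the result is divided by its third component.

```python
import numpy as np

# an arbitrary illustrative homography matrix (not estimated from real images)
H = np.array([[1.2,   0.1,  10.0],
              [0.05,  0.9,  20.0],
              [0.001, 0.0,   1.0]])

def apply_homography(H, x, y):
    """Map the point (x, y) under H: lift to homogeneous coordinates,
    multiply by H, then divide by the third (scale) component."""
    xh, yh, w = H @ np.array([x, y, 1.0])
    return xh / w, yh / w

x2, y2 = apply_homography(H, 100, 50)
print(x2, y2)  # the mapped point; because of the division, the map is not affine
```

Note that when the last row of $H$ is $(0, 0, 1)$ the division is by $1$ and the homography reduces to an affine transformation.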
Let's now learn how to implement homography (perspective image transformation) with opencv-python and skimage.transform module functions (and compare their performance).
with opencv-python¶
The function cv2.getPerspectiveTransform() can be used to compute the perspective transformation matrix. This matrix can be used to perform a perspective transformation on an image. Here are the steps how it can be used to implement homography.
- First, you need four pairs of corresponding points in the source and destination coordinate spaces. Each pair is a tuple of $(x, y)$ coordinates, as shown in the next code snippet.
im_src = (imread('images/books.jpg'))
height, width, dim = im_src.shape
src = np.array([[267., 364.],
[312., 683.],
[555., 598.],
[561., 284.]], dtype='float32')
dst = np.array([[ 0., 0.],
[0, height-1],
[width-1, height-1],
[width-1, 0.]], dtype='float32')
def plot_images(src, dst):
plt.figure(figsize=(7,7))
plt.subplot(121), plt.imshow(src), plt.axis('off'), plt.title('Source image', size=15)
plt.subplot(122), plt.imshow(dst), plt.axis('off'), plt.title('Destination image', size=15)
plt.tight_layout()
plt.show()
- Use cv2.getPerspectiveTransform() to compute the perspective transformation matrix based on the corresponding points. The resulting matrix (M in the code below) is a $3 \times 3$ matrix that represents the perspective transformation.
- Once you have the perspective transformation matrix, you can use it to transform coordinates or warp an image, using the function cv2.warpPerspective().
%timeit M = cv2.getPerspectiveTransform(src, dst)
%timeit im_dst = cv2.warpPerspective(im_src, M, (width, height), flags=cv2.INTER_LINEAR)
#plot_images(im_src, im_dst)
4.22 µs ± 160 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)
7.21 ms ± 385 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
with scikit-image.transform.warp()¶
The skimage.transform.warp() function can be used to apply geometric transformations to images. It is a versatile function that allows you to specify a transformation function or a precomputed transformation matrix. The function skimage.transform.warp() performs backward transformation, whereas cv2.warpPerspective() demonstrated earlier performs forward transformation.
The function accepts the input (source) image along with an inverse coordinate map inverse_map argument, which transforms coordinates in the output image into their corresponding coordinates in the input image (this can be obtained by taking the inverse of the transformation matrix M).
You can specify the interpolation method used during the transformation by setting the order parameter; by default, bi-linear interpolation (order=1) is used for non-boolean input images.
%timeit im_dst = warp(im_src, np.linalg.inv(M), clip=False)
#plot_images(im_src, im_dst)
136 ms ± 17.1 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
with scikit-image.transform.ProjectiveTransform¶
In scikit-image, ProjectiveTransform is a class that represents a 2D projective transformation, which can be used to perform geometric transformations on images, such as rotation, scaling, translation, and skewing. You create an instance of ProjectiveTransform by providing a transformation matrix.
The transformation matrix ($3 \times 3$) can also be estimated using the estimate() method of ProjectiveTransform. It estimates the transformation matrix $M$ from a set of corresponding points (from the source and destination images); the parameters are determined with the total least-squares method, and the number of source and destination coordinates must match. Finally, use the inverse() method to obtain a transform object representing the inverse transformation.
The function warp_projective() in the next code snippet (defined with type annotations) again implements the homography:
- It first estimates the inverse map from the sets of matching points src_pts and dst_pts from the source and destination images, respectively.
- Then it uses the inverse transformation to obtain the source indices corresponding to the destination indices (it also clips the coordinates).
- Finally, the destination image is obtained by selecting the pixels from the source image using the source indices obtained. Effectively, it implements the warp() function.
The next code snippet demonstrates how the function works, with the following figure representing the input image warped to produce the output as shown.
from skimage.transform import ProjectiveTransform
def warp_projective(src_pts: np.ndarray, dst_pts: np.ndarray, im_src: np.ndarray) -> np.ndarray:
src, dst = src_pts[:, [1, 0]], dst_pts[:, [1, 0]]
pt = ProjectiveTransform()
pt.estimate(src, dst)
x, y = np.mgrid[:height, :width]
dst_indices = np.hstack((x.reshape(-1, 1), y.reshape(-1,1)))
src_indices = np.round(pt.inverse(dst_indices), 0).astype(int)
valid_idx = np.where((src_indices[:,0] < height) & (src_indices[:,1] < width) & (src_indices[:,0] >= 0) & (src_indices[:,1] >= 0))
dst_indicies_valid = dst_indices[valid_idx]
src_indicies_valid = src_indices[valid_idx]
im_dst = np.zeros_like(im_src)
im_dst[dst_indicies_valid[:,0],dst_indicies_valid[:,1]] = im_src[src_indicies_valid[:,0],src_indicies_valid[:,1]]
return im_dst.astype(np.uint8)
%timeit im_dst = warp_projective(src, dst, im_src)
plot_images(im_src, im_dst)
149 ms ± 25.5 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
Flip images¶
Flipping an image involves reversing the order of pixels along a particular axis, either horizontally or vertically. This operation is akin to looking at the image in a mirror or turning it upside down. This geometric transformation can be implemented using the function cv2.flip() from the library opencv-python. This function accepts the input image, along with a flipCode, an integer specifying the direction of the flip, as described below.
- A vertical flip (flipCode=0) mirrors the image around the horizontal ($x$) axis: the order of the rows is reversed, so pixels at the top of the image exchange positions with pixels at the bottom.
- A horizontal flip (flipCode=1, or any positive value) mirrors the image around the vertical ($y$) axis: the order of the columns is reversed, so pixels on the left side of the image swap places with pixels on the right.
- Both flips combined (flipCode=-1, or any negative value) apply a horizontal and a vertical flip together, effectively rotating the image by 180 degrees.
Image Manipulation with scipy.ndimage module¶
We can use the ndimage module from the library scipy too for image manipulation; the next
section shows a couple of examples.
Applying an Affine Transformation to an Image¶
We can use the function affine_transform() and provide the 3 x 3 transformation matrix (in
homogeneous coordinates) and the offset, to carry out the transformation, as shown in the
next code block.
im = imread("images/parrot.png")
transformed = affine_transform(im, [[np.cos(np.pi/4),np.sin(np.pi/4), 0],[-np.sin(np.pi/4),np.cos(np.pi/4), 0], [0,0,1]],
offset=[-im.shape[0]/2+75, im.shape[1]/2-50, 0], output_shape=im.shape)
plt.imshow(transformed), plt.axis('off')
plt.show()
Zooming an image¶
The next code block demonstrates how to use the zoom() function from the scipy.ndimage
module to zoom in and then crop a portion of image using numpy ndarray slicing. The next figure shows the output of the code block.
im = imread('images/parrot.jpg') / 255
zoomed_im = ndimage.zoom(im, (2,2,1), mode='nearest', order=1) # no zoom on color channel, order of the spline interpolation = 1
print(im.shape, zoomed_im.shape)
plt.figure(figsize=(10,5))
plt.subplot(121), plt.imshow(im,aspect='auto'), plt.title('Original Image', size=20)
plt.subplot(122), plt.imshow(zoomed_im[150:400,350:650,:], aspect='auto') # crop the enlarged face
plt.title('Zoomed and Cropped Image', size=20)
plt.tight_layout()
plt.show()
(340, 453, 3) (680, 906, 3)
Interpolating an Image¶
Now, you will see how the same function ndimage.zoom() can be used to zoom the small clock image twice with spline interpolation of different orders. The next figure plots the outputs obtained, showing that the output image with order $4$ spline interpolation is a much better approximation of the resized image (much sharper, hence containing much more detail) than the one obtained with order $1$.
im = imread('images/clock.png')
n = 2
interp_images = []
for k in range(1,5):
zoomed_im = ndimage.zoom(im, (n,n,1), mode='nearest', order=k) # no zoom on color channel, order of the spline interpolation = 1
interp_images.append(zoomed_im)
interp_images.append(interp_images[-1]-interp_images[0])
plt.figure(figsize=(20,7))
plt.imshow(np.hstack(interp_images)), plt.axis('off')
plt.title('output with spline interpolation of order 1,2,3,4 and the diff between the last and the first interpolated image', size=20)
plt.show()
Applying Wave distortion to an image¶
The next code snippet displaces the image coordinates by a sinusoidal offset (each column is shifted vertically by $20\sin(x/15)$) and resamples the pixels at the displaced locations with ndimage.map_coordinates(), producing a wavy distortion.
im = imread('images/lena.jpg', True)
x,y = np.meshgrid(np.float32(np.arange(im.shape[1])),np.float32(np.arange(im.shape[0])))
y = y + 20*np.sin(x/15)
distorted = ndimage.map_coordinates(im, [y.ravel(),x.ravel()])
plt.imshow(distorted.reshape(im.shape)), plt.axis('off')
plt.show()
Image Manipulation with wand¶
Wand is a Python binding for ImageMagick, a software suite for creating, editing, composing, or converting images.
!python -VV
Python 3.8.16 (default, Jan 17 2023, 22:25:28) [MSC v.1916 64 bit (AMD64)]
import os
os.environ['MAGICK_HOME']
'C:\\Program Files\\ImageMagick-7.1.1-Q16-HDRI'
Creating Visual Effects¶
If you're looking to create visual effects using Wand (Python's ImageMagick binding), you can follow these general steps:
# https://docs.wand-py.org/en/latest/guide/install.html#install-imagemagick-on-windows
# https://imagemagick.org/script/download.php#windows
# https://docs.wand-py.org/en/0.5.9/guide/fx.html#blue-shift
from wand.image import Image as ImageW
from wand.color import Color
def call_func(img, func_name):
img = img.clone()
func = getattr(img, func_name)
arg_dict = functions[func_name]
func(**arg_dict)
return img
functions = {'spread': {'radius':16.0}, 'blue_shift': {'factor':2},
'charcoal': {'radius':2, 'sigma':1}, 'colorize': {'color': 'magenta', 'alpha':"rgb(10%, 50%, 20%)"},
'tint': {'color': 'magenta', 'alpha':"rgb(50%, 70%, 80%)"}, 'vignette': {'sigma':3, 'x':15, 'y':15},
'sepia_tone': {'threshold': 0.9}, 'implode': {'amount':0.5},
'polaroid': {}, 'swirl': {'degree': -120}, 'wave': {'amplitude': 10, 'wave_length':50},
'shade': {'gray':True, 'azimuth':100.0, 'elevation':45.0}, 'white_balance': {},
'cycle_color_map': {'offset': 50}
}
plt.figure(figsize=(15,9))
plt.gray()
plt.subplots_adjust(0,0,1,0.925,0.05,0.05)
with ImageW(filename="images/parrot.png") as img:
plt.subplot(3,5,1), plt.imshow(np.array(img)), plt.axis('off'), plt.title('original', size=20)
i = 2
for func_name in functions:
out_img = call_func(img, func_name)
plt.subplot(3,5,i), plt.imshow(np.array(out_img)), plt.axis('off'), plt.title(func_name, size=20)
i += 1
plt.suptitle('special effects with wand', size=25)
plt.show()
Sketch, solarize and fx_filter¶
with ImageW(filename="images/parrot.png") as img:
img.transform_colorspace("gray")
img.sketch(0.6, 0.1, 95.0)
with ImageW(filename="images/parrot.png") as img:
img.solarize(threshold=0.5 * img.quantum_range)
fx_filter="(hue > 0.895 || hue < 0.095) ? u : lightness"
with ImageW(filename="images/parrot.png") as img:
filtered_img = img.fx(fx_filter)

Distorting and Undistorting Images¶
Distorting and undistorting images are common tasks in computer vision and image processing. Distortion can occur due to various factors, such as the lens used to capture the image. Undistorting is the process of correcting these distortions. The most common type of distortion is radial distortion, which causes straight lines to appear curved. Here's a general overview of how to distort and undistort images:
Distortion¶
1. Choose a Distortion Model: The most common distortion model is radial distortion, which can be described using the Brown-Conrady model. This model includes parameters like radial distortion coefficients and tangential distortion coefficients.
2. Compute Distortion: Use the distortion model parameters to compute the distorted coordinates of each pixel in the image.
3. Apply Distortion: Map the original pixel coordinates to the distorted coordinates using the computed distortion. This will result in a distorted image.
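The steps above can be sketched with a minimal one-coefficient radial model (the Brown-Conrady model reduced to its $k_1$ term): for each output pixel, the normalized radius $r$ from the image center is scaled by $1 + k_1 r^2$ and the pixel is sampled from that location. The value of k1 below is arbitrary, chosen only for illustration.

```python
import numpy as np

def radial_distort(img, k1=0.2):
    """Apply a one-coefficient radial distortion by inverse mapping
    with nearest-neighbour sampling (a sketch, not a full Brown-Conrady model)."""
    h, w = img.shape[:2]
    out = np.zeros_like(img)
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    y, x = np.mgrid[:h, :w]
    # normalized coordinates relative to the image center
    xn, yn = (x - cx) / cx, (y - cy) / cy
    r2 = xn ** 2 + yn ** 2
    # source location for each destination pixel (inverse warp)
    xs = np.round(cx + xn * (1 + k1 * r2) * cx).astype(int)
    ys = np.round(cy + yn * (1 + k1 * r2) * cy).astype(int)
    # keep only the source locations that fall inside the image
    valid = (xs >= 0) & (xs < w) & (ys >= 0) & (ys < h)
    out[y[valid], x[valid]] = img[ys[valid], xs[valid]]
    return out

img = np.random.rand(64, 64)           # a random image standing in for a real photo
distorted = radial_distort(img, k1=0.3)
print(distorted.shape)                  # same shape as the input
```

The center pixel ($r \approx 0$) is unchanged, while pixels near the corners sample from outside the image and are left black.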
Undistortion¶
1. Calibrate Camera: Capture images of a known calibration pattern (e.g., checkerboard) at different orientations and distances. Use these images to calibrate the camera and estimate distortion parameters.
2. Undistort Images: Apply the distortion parameters to undistort the image. This process involves mapping the distorted pixel coordinates to undistorted coordinates.
3. Use a Library or Framework: Many computer vision libraries and frameworks, such as OpenCV in Python, provide functions to calibrate cameras and undistort images.
In this section, we shall simulate a few distortion models using wand.Image methods and apply the inverse distortion to get the original image back (assuming the parameters of the distortion model are known). However, if the parameters are not known, we need to estimate them first (e.g., using camera calibration), which we shall discuss in later chapters.
Barrel, Polar, Arc and Shepards distortion¶
Let's simulate a few different types of geometric distortion that can affect images. Each of the following distortion types has its own characteristics and a mathematical model describing how it distorts an image.
Barrel Distortion is a type of radial distortion that causes straight lines to curve outward, resembling the shape of a barrel (it creates an outward bump). It can be modeled using polynomial equations, and the distortion increases with the distance from the center of the image.
Polar Distortion, also known as polar transformation or polar warping, involves transforming an image from Cartesian coordinates to polar coordinates. This can result in a circular or radial distortion effect. The mathematical model involves converting Cartesian coordinates (x, y) to polar coordinates (r, θ) and vice versa.
Arc Distortion involves distorting an image by bending or warping it along a curved path, creating an arched effect.
The mathematical model for arc distortion depends on the specific form of the desired arc or curve.
Shepard's Distortion, also known as inverse distance weighting (IDW) interpolation, is a spatial interpolation technique. It can be used to distort an image based on the inverse of the distances to neighboring points. Shepard's method uses a mathematical formula to assign weights to neighboring points based on their distances, and the distortion is determined by interpolating values using these weights.
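Shepard's weighting can be sketched in a few lines: a query point receives a weighted average of the known sample values, with weights proportional to $1/d^p$. The sample points, values, and power p below are arbitrary illustrative choices:

```python
import numpy as np

def shepard_interpolate(pts, vals, query, p=2.0, eps=1e-12):
    """Inverse-distance-weighted (Shepard) interpolation of scattered
    values `vals` at positions `pts`, evaluated at a single `query` point."""
    d = np.linalg.norm(pts - query, axis=1)
    if np.any(d < eps):              # query coincides with a sample point
        return vals[np.argmin(d)]
    w = 1.0 / d ** p                 # inverse-distance weights
    return np.sum(w * vals) / np.sum(w)

pts = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
vals = np.array([0.0, 1.0, 2.0])
print(shepard_interpolate(pts, vals, np.array([0.0, 0.0])))  # exactly at a sample -> 0.0
print(shepard_interpolate(pts, vals, np.array([0.5, 0.5])))  # all samples equidistant -> 1.0
```

Because the query point (0.5, 0.5) is equidistant from all three samples, the weights are equal and the result is the plain average of the values.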
Here we shall use the method distort() from the wand.Image class to apply the above distortions on a given house input image (by passing the distortion names, e.g., barrel, polar etc., along with the other relevant arguments, e.g., arc angle and rotate angle for arc distortion) and plot the corresponding distorted output images obtained, as shown in the next figure, using the following code snippet.
with ImageW(filename='images/house.png') as im:
w, h = im.size
im.virtual_pixel = 'transparent' # 'black'
im_out_barrel = im.clone()
im_out_barrel.distort('barrel', (0.2, 0.0, 0.0, 1.0))
im_out_barrel.save(filename='images/house_distorted.png')
im_out_polar = im.clone()
im_out_polar.distort('polar', (0,))
im_out_arc = im.clone()
im_out_arc.distort('arc', (270, 45)) # ArcAngle, RotateAngle
im_out_arc.resize(w,h)
im_out_shepards = im.clone()
im_out_shepards.artifacts['distort:viewport'] = "260x260-10+10"
im_out_shepards.artifacts['shepards:power'] = "4.0"
im_out_shepards.distort('shepards', (0, 0, 20, 30, 150, 146, 60, 70))
im_out_shepards.resize(w,h)
plt.figure(figsize=(15,7))
plt.imshow(np.hstack((np.array(im), np.array(im_out_barrel),
np.array(im_out_arc), np.array(im_out_polar), np.array(im_out_shepards))))
plt.axis('off'), plt.title('The original and the distorted images (barrel, arc, polar, shepards)', size=20)
plt.show()
Correct distortion by Undistorting¶
You can undo a barrel distortion by applying a barrel_inverse distortion (which produces an inward bump, to the contrary), using the following code snippet.
with ImageW(filename='images/house_distorted.png') as im:
im_out_cor = im.clone()
im_out_cor.distort('barrel_inverse', (0.2, 0.0, 0.0, 1.0))
plt.figure(figsize=(9,4))
plt.imshow(np.hstack((np.array(im), np.array(im_out_cor))))
plt.axis('off'), plt.title('The distorted and corrected images', size=15)
plt.show()
Drawing Convex Hull¶
The method Image.convex_hull() finds the smallest convex polygon enclosing the non-background pixels, and returns a list of (corner) points of the polygon. In the next code snippet we first obtain the set of points defining the convex hull of the objects in the image using the method convex_hull(), and then draw a polygon (with a red boundary) through those points using the polygon() method.
from wand.drawing import Drawing
with ImageW(filename='images/blocks.png') as img:
points = img.convex_hull()
with Drawing() as ctx:
ctx.fill_color = 'transparent'
ctx.stroke_color = 'red'
ctx.polygon(points=points)
ctx(img)
plt.imshow(img)
Image Manipulation with matplotlib¶
We can use the pylab module from the library matplotlib for image manipulation, the next
section shows an example.
Drawing Contour lines for an image¶
A contour line for an image is a curve connecting all the pixels where they have the same particular (constant) value. If we represent the image as a function $f(x,y)$, where $x$ and $y$ are the spatial coordinates, a contour line corresponds to a set of points $(x,y)$ such that $f(x,y)=constant$.
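As a quick sanity check of this definition, for the synthetic "image" $f(x,y) = x^2 + y^2$ every contour line $f(x,y) = c$ is a circle of radius $\sqrt{c}$; skimage.measure.find_contours recovers exactly one such closed curve (the grid resolution below is an arbitrary choice):

```python
import numpy as np
from skimage import measure

# synthetic "image": f(x, y) = x^2 + y^2 sampled on a 100x100 grid over [-1, 1]^2
y, x = np.mgrid[-1:1:100j, -1:1:100j]
f = x ** 2 + y ** 2

# all points with f(x, y) = 0.25 lie on a circle of radius 0.5
contours = measure.find_contours(f, 0.25)
print(len(contours))  # a single closed contour line
```

The returned coordinates are in (row, column) index units; converting them back to $(x, y)$ confirms they lie (up to interpolation error) on the circle of radius $0.5$.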
The following code snippet shows how to draw the contour lines for the gray-scale image of Einstein, using the function contour() from the module matplotlib.pylab.
- The constant levels are specified using the function np.linspace(), which creates an array with 15 values, evenly spaced between 0 and 255 (inclusive), to be used as levels (constants) for the contour plot.
- Contour lines can be styled and colored using various parameters such as colors, cmap (colormap, here plt.cm.hot is used) and linewidths. You can fill the regions between contours using plt.contourf().
- You can add labels to the contours using the plt.clabel() function. This adds labels to the contours at specified locations (as can be seen from the following figures, the levels 18.2, 36.4, 91.1 are added as labels; these values were generated previously using np.linspace()).
#im = color.rgb2gray(imread("images/einstein.jpg")) # read the image from disk as a numpy ndarray
import matplotlib
im = imread("images/einstein.jpg") # read the image from disk as a numpy ndarray
plt.figure(figsize=(10,5))
plt.subplot(131), plt.imshow(im, cmap='gray', aspect='auto'), plt.axis('off'), plt.title('Original Image', size=20)
plt.subplot(132)
cs = plt.contour(np.flipud(im), levels=np.linspace(0,255,15), cmap=plt.cm.hot)
plt.clabel(cs, fontsize=15, inline=True)
plt.axis('off')
plt.title('Image Contour Lines', size=20)
plt.subplot(133), plt.title('Image Filled Contour', size=20), plt.contourf(np.flipud(im), cmap='inferno'), plt.axis('off')
plt.tight_layout()
plt.show()
Compare the PIL.ImageFilter and skimage.measure modules' vs. the opencv-python library's implementation of contour plotting, with the code snippet below.
im = Image.open("images/einstein.jpg")
im_ctr = im.filter(ImageFilter.CONTOUR)
im = cv2.imread("images/einstein.jpg")
im_gray = cv2.cvtColor(im, cv2.COLOR_BGR2GRAY)
_, thresh = cv2.threshold(im_gray, 127, 255, 0)
contours, hierarchy = cv2.findContours(image=thresh, mode=cv2.RETR_TREE, method=cv2.CHAIN_APPROX_SIMPLE)
cv2.drawContours(image=im_gray, contours=contours, contourIdx=-1, color=(0, 0, 0), thickness=2, lineType=cv2.LINE_AA)
im = imread("images/einstein.jpg", True) # read the image from disk as a numpy ndarray
im = im / im.max()
contours = measure.find_contours(im, 0.5) #, fully_connected="high")
plt.figure(figsize=(12,6))
plt.gray()
plt.subplot(131), plt.imshow(im_ctr, aspect='auto'), plt.axis('off'), plt.title('PIL', size=20)
plt.subplot(132), plt.imshow(im_gray, aspect='auto'), plt.axis('off'), plt.title('opencv-python', size=20)
plt.subplot(133), plt.imshow(im, aspect='auto', alpha=0.25), plt.axis('off'), plt.title('skimage', size=20)
for contour in contours:
plt.plot(contour[:, 1], contour[:, 0], linewidth=2)
plt.tight_layout()
plt.show()
Summary¶
In this chapter, we first provided a basic introduction to image processing and the problems we try to solve with it. We then discussed the different tasks / steps in image processing, and the leading image processing libraries in Python that we are going to use for coding in this book. Next, we talked about how to install the different image processing libraries in Python, how to import them, and how to call functions from the modules. We also covered the basic concepts of image types, file formats, and the data structures used to store image data with different Python libraries. Then we discussed how to do image I/O and display in Python using different libraries. Finally, we discussed how to do basic image manipulations with different Python libraries. In the next chapter, we shall dive deep into sampling, quantization, convolution, the Fourier transform, and frequency domain filtering on images.
Further Readings / References¶
- Digital Image Processing, a book by Rafael C. Gonzalez and Richard E. Woods for image processing concepts.
- https://web.stanford.edu/class/ee368/handouts.html
- https://ocw.mit.edu/resources/res-2-006-girls-who-build-cameras-summer-2016/
- http://scikit-image.org/docs/dev/api/skimage.html
- https://pillow.readthedocs.io/en/3.1.x/reference/Image.html
- https://docs.scipy.org/doc/scipy-1.1.0/reference/ndimage.html
- https://matplotlib.org/gallery
- http://members.cbio.mines-paristech.fr/~nvaroquaux/formations/scipylecture-notes/advanced/image_processing/index.html
- http://www.scipy-lectures.org/
- https://irsa.ipac.caltech.edu/applications/FinderChart/docs/color_enhance.html
- https://web.cs.wpi.edu/~emmanuel/courses/cs545/S14/slides/lecture09.pdf
- http://www.eie.polyu.edu.hk/~enyhchan/imagef.pdf
- http://eeweb.poly.edu/~yao/EL5123/lecture12_ImageWarping.pdf
- https://stackoverflow.com/questions/61765758/how-to-apply-fisheye-effect-on-a-normal-image-using-opencv-fisheye-module-in-pyt/76925752#76925752
- https://www.sciencedirect.com/science/article/abs/pii/S0262885620300573#:~:text=The%20zero%2Dsum%20game%20structure,considered%20as%20a%20conflict%20patch